Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly closing tcp connections in sdk to fix timeout errors #44

Merged
merged 2 commits into from
Nov 28, 2023

Conversation

Bryce-Soghigian
Copy link
Collaborator

Description
We will occasionally get an error like:

{"name":"sm-general-purpose-vdnt7"},"namespa
ce":"","name":"sm-general-purpose-vdnt7","reconcileID":"11d07c7c-6c56-4d0a-b49f-50393126d7a9","error":"launching nodeclaim, creating instanc
e, Get \"https://management.azure.com/subscriptions/(redacted)/providers/Microsoft.Network/locations/eastus/operat
ions/(redacted)?api-version=2022-01-01\": dial tcp 20.62.130.212:443: i/o timeout"}

This is caused by an issue in the go std library Ref
When a TCP connection is dropped, there is a bug in the http2 implementation in go that misses to mark the connection as failed. as a result, the connection stays in the pool, and the calls using that connection hit a context deadline.
The connection can stay in the pool fur up to 17 min (with default linux kernel settings).

This PR updates our http client to share its configuration with the AKS Recommended Default middleware provided in the azure-sdk-for-go-extensions project that fixes the pool getting filled up.

How was this change tested?

  • unit tests in the azure-sdk-for-go-extensions
  • proven to fix issue in prod via various aks services that have adopted these client options

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

Fixing TCP Timeout Errors from stale connections in the pool 

Copy link
Collaborator Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@Bryce-Soghigian
Copy link
Collaborator Author

See PR Adding this update to the middleware: Azure/azure-sdk-for-go-extensions#29

@Bryce-Soghigian
Copy link
Collaborator Author

Screenshot 2023-11-22 at 8 46 23 PM Coveralls is failing because i removed code in a file with 100% coverage so it lowered overall percentage

Copy link
Collaborator Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@Bryce-Soghigian Bryce-Soghigian marked this pull request as ready for review November 23, 2023 04:51
@tallaxes tallaxes merged commit d4c3543 into main Nov 28, 2023
8 checks passed
@tallaxes tallaxes deleted the bsoghigian/fixing-tcp-drop-bug branch November 28, 2023 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants